Load packages
library(tidyr)
library(dplyr)
library(tibble)
library(pillar)
library(stringr)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)
library(posterior)
options(pillar.negative = FALSE)
library(loo)
library(priorsense)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(tidybayes)
library(ggdist)
library(patchwork)
library(RColorBrewer)
SEED <- 48927 # set random seed for reproducibility
This notebook contains several examples of how to use Stan in R with brms. It assumes basic knowledge of Bayesian inference and MCMC. The examples are related to the Bayesian data analysis course.
Toy data with a sequence of failures (0) and successes (1). We would like to learn about the unknown probability of success.
data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))
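Before any modelling, the observed proportion of successes gives a simple frequentist reference point (base R only):

```r
# observed data: 7 successes out of 10 trials
y <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0)
sum(y)   # 7
mean(y)  # 0.7, the maximum likelihood estimate of theta
```

The posterior mean computed below is pulled slightly from 0.7 towards 0.5 by the near-uniform prior.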
As usual with generalized linear models (GLMs), brms defines the priors on the latent model parameters. With a Bernoulli model the default link function is logit, and thus the prior is set on logit(theta). As there are no covariates, logit(theta) = Intercept. The brms default prior for Intercept is student_t(3, 0, 2.5), but we use student_t(7, 0, 1.5), which is close to the logistic distribution and thus makes the prior near-uniform for theta. We can simulate from these priors to check the implied prior on theta. We then compare the result to using a normal(0, 1.5) prior on the logit probability. We visualize the implied priors by sampling from the priors.
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=3, mu=0, sigma=2.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='Default brms student_t(3, 0, 2.5) prior on Intercept')
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept')
An almost uniform prior on theta could also be obtained with normal(0, 1.5)
data.frame(theta = plogis(rnorm(n=20000, mean=0, sd=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='normal(0, 1.5) prior on Intercept')
Formula y ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 1 = \alpha\). brms denotes \(\alpha\) as Intercept.
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
prior = prior(student_t(7, 0, 1.5), class='Intercept'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_bern
Family: bernoulli
Links: mu = logit
Formula: y ~ 1
Data: data_bern (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.76 0.64 -0.43 2.09 1.00 1734 1726
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Extract the posterior draws
draws <- as_draws_df(fit_bern)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.763 0.746 0.641 0.636 -0.242 1.90 1.00 1734. 1726.
We can compute the probability of success by using plogis, which is the inverse-logit function
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
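As a quick base-R check, plogis matches the inverse-logit formula written out by hand, and qlogis is its inverse (the logit function):

```r
# inverse-logit written out explicitly
inv_logit <- function(x) 1 / (1 + exp(-x))
x <- 0.763                          # e.g. the posterior mean of b_Intercept
all.equal(plogis(x), inv_logit(x))  # TRUE
all.equal(qlogis(plogis(x)), x)     # TRUE
```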
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.668 0.678 0.130 0.134 0.440 0.870 1.00 1734. 1726.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
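As the student_t(7, 0, 1.5) prior is nearly uniform on theta, the posterior should be close to the conjugate result for a uniform Beta(1, 1) prior, namely Beta(1 + 7, 1 + 3). A quick analytic check (compare to the summarise_draws() output above):

```r
# conjugate Beta posterior under a uniform prior: Beta(1 + successes, 1 + failures)
a <- 1 + 7; b <- 1 + 3
a / (a + b)                              # mean 0.667; MCMC gave 0.668
sqrt(a * b / ((a + b)^2 * (a + b + 1)))  # sd 0.131; MCMC gave 0.130
qbeta(c(0.05, 0.95), a, b)               # 90% interval, close to the MCMC (0.44, 0.87)
```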
The prior and likelihood sensitivity plot shows the posterior density estimate depending on the amount of power-scaling. Overlapping lines indicate low sensitivity, and wider gaps between lines indicate greater sensitivity.
theta <- draws |>
subset_draws(variable='theta')
powerscale_sequence(fit_bern, prediction = \(x, ...) theta) |>
powerscale_plot_dens(variables='theta') +
# switch rows and cols
facet_grid(rows=vars(.data$variable),
cols=vars(.data$component)) +
# cleaning
ggtitle(NULL,NULL) +
labs(x='theta', y=NULL) +
scale_y_continuous(breaks=NULL) +
theme(axis.line.y=element_blank(),
strip.text.y=element_blank()) +
xlim(c(0,1))
We can summarise the prior and likelihood sensitivity using cumulative Jensen-Shannon distance.
powerscale_sensitivity(fit_bern, prediction = \(x, ...) theta)$sensitivity |>
filter(variable=='theta') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 theta 0.04 0.11 -
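To make the diagnostic less of a black box, here is a minimal sketch of the Jensen-Shannon distance between two discrete distributions (priorsense computes a cumulative version from the power-scaled draws; the function names here are our own):

```r
# Kullback-Leibler divergence for discrete distributions (0 * log 0 treated as 0)
kl_div <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))
# Jensen-Shannon distance: square root of the symmetrized, smoothed KL
js_dist <- function(p, q) {
  m <- (p + q) / 2
  sqrt(0.5 * kl_div(p, m) + 0.5 * kl_div(q, m))
}
p <- c(0.2, 0.5, 0.3)
js_dist(p, p)                 # 0: identical distributions
js_dist(p, c(0.5, 0.2, 0.3))  # > 0: distribution changed under perturbation
```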
Instead of a sequence of 0’s and 1’s, we can summarize the data with the number of trials and the number of successes, and use a binomial model. The prior is specified in the ‘latent space’. The actual probability of success, theta = plogis(alpha), where plogis is the logistic function, that is, the inverse of the logit.
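The two likelihoods are equivalent for inference on theta: the binomial density equals the product of the ten Bernoulli densities times a constant binomial coefficient. A quick numeric check in base R:

```r
theta <- 0.7
y <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0)
bern_prod <- prod(dbinom(y, size = 1, prob = theta))  # product of Bernoulli terms
all.equal(dbinom(7, size = 10, prob = theta),
          choose(10, 7) * bern_prod)  # TRUE: equal up to the constant choose(10, 7)
```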
Binomial model with the same data and prior
data_bin <- data.frame(N = c(10), y = c(7))
Formula y | trials(N) ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha\), and the number of trials for each observation is provided by | trials(N)
fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
prior = prior(student_t(7, 0,1.5), class='Intercept'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_bin
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1
Data: data_bin (Number of observations: 1)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.75 0.64 -0.47 2.07 1.00 1699 1508
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The diagnostic indicates prior-data conflict, that is, both prior and likelihood are informative. If there is genuinely strong prior information that would justify the normal(0, 1) prior, then this is fine, but otherwise more thinking is required (the goal is not to adjust the prior to remove diagnostic warnings without thinking). In this toy example, we proceed with this prior.
Extract the posterior draws
draws <- as_draws_df(fit_bin)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 0.749 0.741 0.635 0.620 -0.266 1.87 1.00 1699. 1508.
We can compute the probability of success by using plogis, which is the inverse-logit function
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.665 0.677 0.130 0.131 0.434 0.866 1.00 1699. 1508.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
Re-run the model with a new dataset without recompiling
data_bin <- data.frame(N = c(5), y = c(4))
fit_bin <- update(fit_bin, newdata = data_bin)
Check the summary of the posterior and inference diagnostics.
fit_bin
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1
Data: data_bin (Number of observations: 1)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.05 0.92 -0.61 2.98 1.00 1470 2036
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Extract the posterior draws
draws <- as_draws_df(fit_bin)
We can get summary information using summarise_draws()
draws |>
subset_draws(variable='b_Intercept') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 1.05 1.02 0.920 0.890 -0.399 2.65 1.00 1470. 2036.
We can compute the probability of success by using plogis, which is the inverse-logit function
draws <- draws |>
mutate_variables(theta=plogis(b_Intercept))
Summary of theta by using summarise_draws()
draws |>
subset_draws(variable='theta') |>
summarise_draws()
# A tibble: 1 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 theta 0.710 0.735 0.162 0.164 0.402 0.934 1.00 1470. 2036.
Histogram of theta
mcmc_hist(draws, pars='theta') +
xlab('theta') +
xlim(c(0,1))
An experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients. A group of patients were randomly assigned to treatment and control groups:
Data, where grp2 is an indicator variable defined as a factor type, which is useful for categorical variables.
data_bin2 <- data.frame(N = c(674, 680),
y = c(39,22),
grp2 = factor(c('control','treatment')))
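Before fitting, the raw-data odds ratio gives a reference point (the posterior estimates below are slightly shrunk by the prior):

```r
# sample odds of death in each group, and their ratio
odds_control   <- 39 / (674 - 39)
odds_treatment <- 22 / (680 - 22)
odds_treatment / odds_control  # about 0.54
```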
To analyse whether the treatment is useful, we can use a binomial model for both groups and compute the odds ratio. To recreate the model as two independent (separate) binomial models, we use the formula y | trials(N) ~ 0 + grp2, which corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 0 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), where \(x_\mathrm{control}\) is a vector with 1 for control and 0 for treatment, and \(x_\mathrm{treatment}\) is a vector with 1 for treatment and 0 for control. As only one of the vectors is 1 for each observation, this corresponds to the separate models \(\mathrm{logit}(\theta_\mathrm{control}) = \beta_\mathrm{control}\) and \(\mathrm{logit}(\theta_\mathrm{treatment}) = \beta_\mathrm{treatment}\). We can provide the same prior for all \(\beta\)’s by setting the prior with class='b'. With prior student_t(7, 0, 1.5), both \(\beta\)’s are shrunk towards 0, but independently.
fit_bin2 <- brm(y | trials(N) ~ 0 + grp2, family = binomial(), data = data_bin2,
prior = prior(student_t(7, 0,1.5), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. With ~ 0 + grp2 there is no Intercept, and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) are reported as grp2control and grp2treatment.
fit_bin2
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 0 + grp2
Data: data_bin2 (Number of observations: 2)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
grp2control -2.78 0.17 -3.13 -2.45 1.00 2406 2661
grp2treatment -3.38 0.21 -3.80 -2.99 1.00 3386 2390
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Compute theta for each group and the odds ratio. brms uses the variable names b_grp2control and b_grp2treatment for \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\), respectively.
draws_bin2 <- as_draws_df(fit_bin2) |>
mutate(theta_control = plogis(b_grp2control),
theta_treatment = plogis(b_grp2treatment),
oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control)))
Plot oddsratio
mcmc_hist(draws_bin2, pars='oddsratio') +
scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
geom_vline(xintercept=1, linetype='dashed')
Probability that the oddsratio<1
draws_bin2 |>
mutate(poddsratio = oddsratio<1) |>
subset(variable='poddsratio') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 poddsratio 0.984 0.00234
oddsratio 95% posterior interval
draws_bin2 |>
subset(variable='oddsratio') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 oddsratio 0.320 0.928 0.00381 0.0149
Make a prior sensitivity analysis by power-scaling both the prior and the likelihood. Focus on the odds ratio, which is the quantity of interest. We see that the likelihood is much more informative than the prior, and we would expect to see a different posterior only with a highly informative prior (possibly based on previous similar experiments).
oddsratio <- draws_bin2 |>
subset_draws(variable='oddsratio')
The prior and likelihood sensitivity plot shows the posterior density estimate depending on the amount of power-scaling. Overlapping lines indicate low sensitivity, and wider gaps between lines indicate greater sensitivity.
powerscale_sequence(fit_bin2, prediction = \(x, ...) oddsratio) |>
powerscale_plot_dens(variables='oddsratio') +
# switch rows and cols
facet_grid(rows=vars(.data$variable),
cols=vars(.data$component)) +
# cleaning
ggtitle(NULL,NULL) +
labs(x='Odds-ratio', y=NULL) +
scale_y_continuous(breaks=NULL) +
theme(axis.line.y=element_blank(),
strip.text.y=element_blank()) +
# reference line
geom_vline(xintercept=1, linetype='dashed')
We can summarise the prior and likelihood sensitivity using cumulative Jensen-Shannon distance.
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='oddsratio') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 oddsratio 0.01 0.14 -
Above we used the formula y | trials(N) ~ 0 + grp2 to have a separate model for the control and treatment groups. An alternative model y | trials(N) ~ grp2, which is equal to y | trials(N) ~ 1 + grp2, would correspond to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\). Now \(\alpha\) models the probability of death (via the logistic link) in the control group and \(\alpha + \beta_\mathrm{treatment}\) models the probability of death (via the logistic link) in the treatment group. Now the models for the groups are connected. Furthermore, if we set independent student_t(7, 0, 1.5) priors on \(\alpha\) and \(\beta_\mathrm{treatment}\), the implied priors on \(\theta_\mathrm{control}\) and \(\theta_\mathrm{treatment}\) are different. We can verify this with a prior simulation.
data.frame(theta_control = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept') +
data.frame(theta_treatment = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5) +
                                    ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
mcmc_hist() +
xlim(c(0,1)) +
labs(title='student_t(7, 0, 1.5) prior on Intercept and b_grp2treatment')
In this case, with relatively big treatment and control group, the likelihood is informative, and the difference between using y | trials(N) ~ 0 + grp2 or y | trials(N) ~ grp2 is negligible.
A third option would be a hierarchical model with formula y | trials(N) ~ 1 + (1 | grp2), which is equivalent to y | trials(N) ~ (1 | grp2), and corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), but now the prior on \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) is \(\mathrm{normal}(0, \sigma_\mathrm{grp})\). The default brms prior for \(\sigma_\mathrm{grp}\) is student_t(3, 0, 2.5). Now \(\alpha\) models the overall probability of death (via the logistic link), and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) model the differences from that, having the same prior. The prior for \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) includes the unknown scale \(\sigma_\mathrm{grp}\). If there is no difference between the control and treatment groups, the posterior of \(\sigma_\mathrm{grp}\) has more mass near 0, and the bigger the difference between the groups, the more mass there is away from 0. With just two groups, there is not much information about \(\sigma_\mathrm{grp}\), and unless there is an informative prior on \(\sigma_\mathrm{grp}\), a two-group hierarchical model is not that useful. Hierarchical models are more useful with more than two groups. In the following, we use the previously used student_t(7, 0, 1.5) prior on the intercept and the default brms prior student_t(3, 0, 2.5) on \(\sigma_\mathrm{grp}\).
fit_bin2 <- brm(y | trials(N) ~ 1 + (1 | grp2), family = binomial(), data = data_bin2,
prior = prior(student_t(7, 0,1.5), class='Intercept'),
seed = SEED, refresh = 0, control=list(adapt_delta=0.99))
Check the summary of the posterior and inference diagnostics. The summary reports that there are Group-Level Effects: ~grp2 with 2 levels (control and treatment), with sd(Intercept) denoting \(\sigma_\mathrm{grp}\). In addition, the summary lists Population-Level Effects: Intercept (\(\alpha\)) as in the previous non-hierarchical models.
fit_bin2
Family: binomial
Links: mu = logit
Formula: y | trials(N) ~ 1 + (1 | grp2)
Data: data_bin2 (Number of observations: 2)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~grp2 (Number of levels: 2)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.64 1.45 0.15 5.69 1.00 549 932
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.21 1.24 -3.89 0.78 1.00 611 752
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We can also look at the variable names brms uses internally
as_draws_rvars(fit_bin2)
# A draws_rvars: 1000 iterations, 4 chains, and 5 variables
$b_Intercept: rvar<1000,4>[1] mean ± sd:
[1] -2.2 ± 1.2
$sd_grp2__Intercept: rvar<1000,4>[1] mean ± sd:
[1] 1.6 ± 1.4
$r_grp2: rvar<1000,4>[2,1] mean ± sd:
Intercept
control -0.6 ± 1.2
treatment -1.2 ± 1.3
$lprior: rvar<1000,4>[1] mean ± sd:
[1] -4.2 ± 0.71
$lp__: rvar<1000,4>[1] mean ± sd:
[1] -13 ± 1.8
Although there is no practical difference, we illustrate how to compute the odds ratio from the hierarchical model
draws_bin2 <- as_draws_df(fit_bin2)
oddsratio <- draws_bin2 |>
mutate_variables(theta_control = plogis(b_Intercept + `r_grp2[control,Intercept]`),
theta_treatment = plogis(b_Intercept + `r_grp2[treatment,Intercept]`),
oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control))) |>
subset_draws(variable='oddsratio')
oddsratio |> mcmc_hist() +
scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
geom_vline(xintercept=1, linetype='dashed')
Make also a prior sensitivity analysis with focus on the odds ratio.
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='oddsratio') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 oddsratio 0.01 0.14 -
Use the Kilpisjärvi summer month temperatures 1952–2022 data from the aaltobda package
load(url('https://github.com/avehtari/BDA_course_Aalto/raw/master/rpackage/data/kilpisjarvi2022.rda'))
data_lin <- data.frame(year = kilpisjarvi2022$year,
temp = kilpisjarvi2022$temp.summer)
Plot the data
data_lin |>
ggplot(aes(year, temp)) +
geom_point(color=2) +
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
guides(linetype = "none")
To analyse whether there has been change in the average summer month temperature, we use a linear model with a Gaussian model for the unexplained variation. By default brms uses a uniform prior for the coefficients.
Formula temp ~ year corresponds to a model \(\mathrm{temp} \sim \mathrm{normal}(\alpha + \beta \times \mathrm{year}, \sigma)\). The model could also be defined as temp ~ 1 + year, which explicitly shows the intercept (\(\alpha\)) part. Using the variable names brms uses, the model can be written also as temp ~ normal(b_Intercept*1 + b_year*year, sigma). We start with the default priors to see some tricks that brms does behind the curtain.
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_lin
Family: gaussian
Links: mu = identity; sigma = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -34.69 12.49 -58.73 -10.19 1.00 3995 3035
year 0.02 0.01 0.01 0.03 1.00 3996 3035
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.09 0.91 1.28 1.00 3057 3011
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Convergence diagnostics look good. We see that posterior mean of Intercept is -34.7, which may sound strange, but that is the intercept at year 0, that is, very far from the data range, and thus doesn’t have meaningful interpretation directly. The posterior mean of year coefficient is 0.02, that is, we estimate that the summer temperature is increasing 0.02°C per year (which would make 1°C in 50 years).
We can check \(R^2\), which corresponds to the proportion of variance explained by the model. The linear model explains 0.16 = 16% of the total data variance.
bayes_R2(fit_lin) |> round(2)
Estimate Est.Error Q2.5 Q97.5
R2 0.16 0.07 0.03 0.3
We can check all the priors used.
prior_summary(fit_lin)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b year (vectorized)
student_t(3, 9.5, 2.5) Intercept default
student_t(3, 0, 2.5) sigma 0 default
We see that class=b and coef=year have flat, that is, improper uniform priors, Intercept has a student_t(3, 9.5, 2.5) prior, and sigma has a student_t(3, 0, 2.5) prior. In general it is good to use proper priors, but sometimes flat priors are fine and produce a proper posterior (as in this case). An important point here is that by default, brms sets the prior on Intercept after centering the covariate values (design matrix). In this case, brms uses year - mean(year) = year - 1987 instead of the original years. This in general improves the sampling efficiency. As the Intercept is now defined at the middle of the data, the default Intercept prior is centered on the median of the target (here the target is temp). If we would like to set informative priors, we need to set the informative prior on the Intercept given the centered covariate values. We can turn off the centering by setting the argument center=FALSE, and we can set the prior on the original intercept by using the formula temp ~ 0 + Intercept + year. In this case, we are happy with the default prior for the intercept. In this specific case, the flat prior on the coefficient is also fine, but we add a weakly informative prior just for illustration. Let’s assume we expect the temperature to change less than 1°C in 10 years. With student_t(3, 0, 0.03) about 95% of the prior mass has less than a 0.1°C change per year, and with a low number of degrees of freedom (3) we have thick tails, making the likelihood dominate in case of prior-data conflict. In real life, we do have much more information about the temperature change, and naturally a hierarchical spatio-temporal model with all temperature measurement locations would be even better.
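The effect of the centering can be illustrated with a plain stats::lm() fit on synthetic data (this is not what brms returns, just the same design-matrix idea): with a centered covariate, the intercept is the expected response at the covariate mean rather than at year 0.

```r
set.seed(1)
year <- 1952:2022
temp <- 9.5 + 0.02 * (year - 1987) + rnorm(length(year), sd = 1)  # synthetic temps
coef(lm(temp ~ year))[["(Intercept)"]]                  # extrapolated to year 0
coef(lm(temp ~ I(year - mean(year))))[["(Intercept)"]]  # equals mean(temp)
```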
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_lin
Family: gaussian
Links: mu = identity; sigma = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -32.54 12.28 -56.70 -9.01 1.00 4183 3259
year 0.02 0.01 0.01 0.03 1.00 4182 3259
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.08 0.09 0.92 1.27 1.00 3494 2709
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Make prior sensitivity analysis by power-scaling both prior and likelihood.
powerscale_sensitivity(fit_lin)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 3 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_Intercept 0.03 0.09 -
2 b_year 0.03 0.09 -
3 sigma 0.00 0.13 -
Our weakly informative proper prior has negligible sensitivity, and the likelihood is informative. Extract the posterior draws and check the summaries
draws_lin <- as_draws_df(fit_lin)
draws_lin |> summarise_draws()
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -3.25e+1 -3.24e+1 1.23e+1 1.24e+1 -5.29e+1 -1.29e+1 1.00 4183.
2 b_year 2.11e-2 2.11e-2 6.18e-3 6.22e-3 1.12e-2 3.14e-2 1.00 4182.
3 sigma 1.08e+0 1.07e+0 9.14e-2 9.08e-2 9.43e-1 1.24e+0 1.00 3494.
4 lprior -1.08e+0 -1.06e+0 1.65e-1 1.65e-1 -1.38e+0 -8.51e-1 1.00 4173.
5 lp__ -1.07e+2 -1.06e+2 1.21e+0 9.72e-1 -1.09e+2 -1.05e+2 1.00 1899.
# ℹ 1 more variable: ess_tail <dbl>
If one of the columns is hidden, we can force printing of all columns
draws_lin |> summarise_draws() |> print(width=Inf)
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept -32.5 -32.4 12.3 12.4 -52.9 -12.9 1.00
2 b_year 0.0211 0.0211 0.00618 0.00622 0.0112 0.0314 1.00
3 sigma 1.08 1.07 0.0914 0.0908 0.943 1.24 1.00
4 lprior -1.08 -1.06 0.165 0.165 -1.38 -0.851 1.00
5 lp__ -107. -106. 1.21 0.972 -109. -105. 1.00
ess_bulk ess_tail
<dbl> <dbl>
1 4183. 3259.
2 4182. 3259.
3 3494. 2709.
4 4173. 3285.
5 1899. 2576.
Histogram of b_year
draws_lin |>
mcmc_hist(pars='b_year') +
xlab('Average temperature increase per year')
Probability that the coefficient b_year > 0 and the corresponding MCSE
draws_lin |>
mutate(I_b_year_gt_0 = b_year>0) |>
subset_draws(variable='I_b_year_gt_0') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 I_b_year_gt_0 1 NA
All posterior draws have b_year > 0, so the probability gets rounded to 1, and the MCSE is not available, as the observed posterior variance is 0.
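A minimal illustration of why the MCSE is NA (this is generic, not specific to the fitted model): the indicator is 1 for every draw, so its sample variance is exactly zero and no Monte Carlo uncertainty estimate can be formed.

```r
ind <- rep(1, 4000)  # indicator b_year > 0 is TRUE for all 4000 draws
mean(ind)            # 1
var(ind)             # 0, so the MCSE of the mean cannot be estimated
```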
95% posterior interval for temperature increase per 100 years
draws_lin |>
mutate(b_year_100 = b_year*100) |>
subset_draws(variable='b_year_100') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
~mcse_quantile(.x, probs = c(0.025, 0.975)),
.num_args = list(digits = 2, notation = "dec"))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 b_year_100 0.93 3.33 0.03 0.03
Plot posterior draws of the linear function values at each year. add_linpred_draws() takes the years from the data and uses fit_lin to make the predictions.
data_lin |>
add_linpred_draws(fit_lin) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .linpred), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2020,by=10))
Alternatively, plot a spaghetti plot for 100 draws
data_lin |>
add_linpred_draws(fit_lin, ndraws=100) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot a line for each posterior draw
geom_line(aes(y=.linpred, group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2020,by=10))
Plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_lin to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_lin) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
Posterior predictive check with density overlays examines the whole temperature distribution
pp_check(fit_lin, type='dens_overlay', ndraws=20)
The LOO-PIT check is good for checking whether the normal distribution describes the variation well, as it examines the calibration of LOO predictive distributions conditionally on each year. The LOO-PIT plot looks good.
pp_check(fit_lin, type='loo_pit_qq', ndraws=4000)
The temperatures used in the above analyses are averages over three months, which makes it more likely that they are normally distributed, but there can be extreme events in the weather, and we can check whether a more robust Student’s \(t\) observation model would give different results (although the LOO-PIT check did already indicate that the normal model is good).
fit_lin_t <- brm(temp ~ year, data = data_lin, family = student(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. The b_year posterior looks similar to before, and the posterior of the degrees of freedom nu has most of its mass at quite large values, indicating there is no strong support for thick-tailed variation in the average summer temperatures.
fit_lin_t
Family: student
Links: mu = identity; sigma = identity; nu = identity
Formula: temp ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -34.01 12.27 -58.50 -9.31 1.00 3979 2893
year 0.02 0.01 0.01 0.03 1.00 3979 2923
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.03 0.10 0.86 1.24 1.00 3209 2302
nu 24.54 14.36 6.36 60.80 1.00 2972 2325
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows normal and Student’s \(t\) model have similar performance.
loo_compare(loo(fit_lin), loo(fit_lin_t))
elpd_diff se_diff
fit_lin 0.0 0.0
fit_lin_t -0.4 0.3
Heteroskedasticity means that the variation around the linear mean can also vary. We can allow sigma to depend on year, too. Although the additional component is written as sigma ~ year, the log link function is used, so the model is for log(sigma). bf() allows listing several formulas.
fit_lin_h <- brm(bf(temp ~ year,
sigma ~ year),
data = data_lin, family = gaussian(),
prior = prior(student_t(3, 0, 0.03), class='b'),
seed = SEED, refresh = 0)
Check the summary of the posterior and inference diagnostics. The b_year posterior looks similar to before. The posterior for sigma_year has most of its mass at negative values, indicating a decrease in temperature variation around the mean.
fit_lin_h
Family: gaussian
Links: mu = identity; sigma = log
Formula: temp ~ year
sigma ~ year
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -36.37 12.49 -61.25 -10.49 1.00 3412 2842
sigma_Intercept 19.10 8.69 1.56 35.80 1.00 3818 2899
year 0.02 0.01 0.01 0.04 1.00 3426 2885
sigma_year -0.01 0.00 -0.02 -0.00 1.00 3810 2855
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior distributions of b_year and b_sigma_year
as_draws_df(fit_lin_h) |>
mcmc_areas(pars=c('b_year', 'b_sigma_year'))
As exp(x) is approximately 1 + x when x is close to zero, we can see that sigma is decreasing about 1% per year (95% interval from 0% to 2%).
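This rule of thumb is easy to verify numerically: on the log scale a coefficient b corresponds to a multiplicative change exp(b) in sigma per year, and for small b the proportional change exp(b) - 1 is close to b itself.

```r
# exp(b) - 1 gives the proportional change in sigma per year;
# the values below roughly match the 95% interval endpoints above.
b <- c(-0.02, -0.01, 0)
round(100 * (exp(b) - 1), 2)   # percent change per year, close to 100 * b
```

So a coefficient of -0.01 on log(sigma) is, to a good approximation, a 1% decrease per year.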
Plot the posterior predictive distribution for each year until 2030. add_predicted_draws() takes the years from the data and uses fit_lin_h to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_lin_h) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
Make prior sensitivity analysis by power-scaling both prior and likelihood.
powerscale_sensitivity(fit_lin_h)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 4 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_Intercept 0.03 0.11 -
2 b_sigma_Intercept 0.00 0.10 -
3 b_year 0.03 0.11 -
4 b_sigma_year 0.00 0.11 -
We can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows the homoskedastic normal and heteroskedastic normal models have similar performance.
loo_compare(loo(fit_lin), loo(fit_lin_h))
elpd_diff se_diff
fit_lin_h 0.0 0.0
fit_lin -1.6 1.6
We can test the linearity assumption by using non-linear spline functions, that is, s(year) terms. Sampling is slower as the posterior gets more complex.
fit_spline_h <- brm(bf(temp ~ s(year),
sigma ~ s(year)),
data = data_lin, family = gaussian(),
seed = SEED, refresh = 0)
We get warnings about divergences, and try rerunning with higher adapt_delta, which leads to using smaller step sizes. Often adapt_delta=0.999 leads to very slow sampling, but with this small data, this is not an issue.
fit_spline_h <- update(fit_spline_h, control = list(adapt_delta=0.999))
Check the summary of the posterior and inference diagnostics. We're no longer able to interpret the temperature increase directly from this summary. For the splines, the summary shows the standard deviations sds of the spline coefficients.
fit_spline_h
Family: gaussian
Links: mu = identity; sigma = log
Formula: temp ~ s(year)
sigma ~ s(year)
Data: data_lin (Number of observations: 71)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Smooth Terms:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(syear_1) 1.04 0.95 0.04 3.55 1.00 1943 1987
sds(sigma_syear_1) 0.95 0.91 0.03 3.43 1.00 1365 2147
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 9.42 0.13 9.16 9.68 1.00 5309 2967
sigma_Intercept 0.04 0.09 -0.12 0.21 1.00 4293 2896
syear_1 2.85 2.66 -2.76 8.35 1.00 2337 2216
sigma_syear_1 -1.03 2.37 -6.48 3.83 1.00 1895 1241
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We can still plot the posterior predictive distribution for each year until 2030. add_predicted_draws() takes the years from the data and uses fit_spline_h to make the predictions.
data_lin |>
add_row(year=2023:2030) |>
add_predicted_draws(fit_spline_h) |>
# plot data
ggplot(aes(x=year, y=temp)) +
geom_point(color=2) +
# plot lineribbon for the linear model
stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
# decoration
scale_fill_brewer()+
labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
theme(legend.position="none")+
scale_x_continuous(breaks=seq(1950,2030,by=10))
Warning: Removed 32000 rows containing missing values (`geom_point()`).
And we can use leave-one-out cross-validation to compare the expected predictive performance.
LOO comparison shows the homoskedastic normal linear and heteroskedastic normal spline models have similar performance. There are not enough observations to make a clear difference between the models.
loo_compare(loo(fit_lin), loo(fit_spline_h))
elpd_diff se_diff
fit_spline_h 0.0 0.0
fit_lin -0.7 1.8
For spline and other non-parametric models, we can use posterior predictions to get interpretable quantities. Let's examine the difference of the estimated average temperature between the years 1952 and 2022. The two code blocks below show two equivalent ways to compute it.
temp_diff <- posterior_epred(fit_spline_h, newdata=filter(data_lin,year==1952|year==2022)) |>
rvar() |>
diff() |>
as_draws_df() |>
set_variables('temp_diff')
temp_diff <- data_lin |>
filter(year==1952|year==2022) |>
add_epred_draws(fit_spline_h) |>
pivot_wider(id_cols=.draw, names_from = year, values_from = .epred) |>
mutate(temp_diff = `2022`-`1952`,
.chain = (.draw - 1) %/% 1000 + 1,
.iteration = (.draw - 1) %% 1000 + 1) |>
as_draws_df() |>
subset_draws(variable='temp_diff')
Posterior distribution for average summer temperature increase from 1952 to 2022
temp_diff |>
mcmc_hist()
95% posterior interval for average summer temperature increase from 1952 to 2022
temp_diff |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
~mcse_quantile(.x, probs = c(0.025, 0.975)),
.num_args = list(digits = 2, notation = "dec"))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 temp_diff 0.51 2.58 0.03 0.03
Make prior sensitivity analysis by power-scaling both prior and likelihood with focus on average summer temperature increase from 1952 to 2022.
powerscale_sensitivity(fit_spline_h, prediction = \(x, ...) temp_diff, num_args=list(digits=2)
)$sensitivity |>
filter(variable=='temp_diff') |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 temp_diff 0.01 0.08 -
The probability that the average summer temperature has increased from 1952 to 2022 is 99.7%.
temp_diff |>
mutate(I_temp_diff_gt_0 = temp_diff>0,
temp_diff = NULL) |>
subset_draws(variable='I_temp_diff_gt_0') |>
summarise_draws(mean, mcse_mean)
# A tibble: 1 × 3
variable mean mcse_mean
<chr> <dbl> <dbl>
1 I_temp_diff_gt_0 0.997 0.000895
Load the factory data, which contain 5 quality measurements for each of 6 machines. We're interested in analysing whether there are quality differences between the machines.
factory <- read.table(url('https://raw.githubusercontent.com/avehtari/BDA_course_Aalto/master/rpackage/data-raw/factory.txt'))
colnames(factory) <- 1:6
factory
1 2 3 4 5 6
1 83 117 101 105 79 57
2 92 109 93 119 97 92
3 92 114 92 116 103 104
4 46 104 86 102 79 77
5 67 87 67 116 92 100
We pivot the data to long format
factory <- factory |>
pivot_longer(cols = everything(),
names_to = 'machine',
values_to = 'quality')
factory
# A tibble: 30 × 2
machine quality
<chr> <int>
1 1 83
2 2 117
3 3 101
4 4 105
5 5 79
6 6 57
7 1 92
8 2 109
9 3 93
10 4 119
# ℹ 20 more rows
As a comparison, we also fit a pooled model.
fit_pooled <- brm(quality ~ 1, data = factory, refresh=0)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: gaussian
Links: mu = identity; sigma = identity
Formula: quality ~ 1
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 92.88 3.24 86.49 99.17 1.00 2540 2201
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 18.40 2.44 14.40 23.79 1.00 3198 2431
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
As a comparison, we also fit a separate model. To make it completely separate, we need a different sigma for each machine, too.
fit_separate <- brm(bf(quality ~ 0 + machine,
sigma ~ 0 + machine),
data = factory, refresh=0)
Check the summary of the posterior and inference diagnostics.
fit_separate
Family: gaussian
Links: mu = identity; sigma = log
Formula: quality ~ 0 + machine
sigma ~ 0 + machine
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
machine1 76.38 11.18 53.55 100.54 1.00 2371 1909
machine2 106.11 8.85 91.08 121.82 1.00 1381 892
machine3 88.44 10.95 71.84 105.79 1.00 1023 431
machine4 111.57 5.04 101.79 121.61 1.00 2019 1464
machine5 90.13 6.81 77.23 103.60 1.00 2184 1561
machine6 86.14 11.50 62.99 108.31 1.00 2225 1605
sigma_machine1 3.10 0.39 2.44 3.97 1.00 2793 2123
sigma_machine2 2.61 0.41 1.97 3.57 1.00 1558 899
sigma_machine3 2.69 0.42 2.04 3.64 1.00 1408 788
sigma_machine4 2.16 0.41 1.52 3.12 1.00 1976 1301
sigma_machine5 2.50 0.41 1.87 3.43 1.00 2082 1366
sigma_machine6 3.08 0.39 2.44 4.00 1.00 2122 1945
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The hierarchical model assumes a common population distribution for the machine-specific intercepts, partially pooling information across machines.
fit_hier <- brm(quality ~ 1 + (1 | machine),
data = factory, refresh = 0)
Check the summary of the posterior and inference diagnostics.
fit_hier
Family: gaussian
Links: mu = identity; sigma = identity
Formula: quality ~ 1 + (1 | machine)
Data: factory (Number of observations: 30)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~machine (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 12.68 5.94 2.96 27.27 1.00 1030 1090
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 92.96 5.71 81.53 104.27 1.00 1263 1326
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 15.09 2.37 11.29 20.68 1.00 1869 2103
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
LOO comparison shows the hierarchical model is the best. The differences are small, as the number of observations is small and there is considerable predictive (aleatoric) uncertainty.
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
Warning: Found 3 observations with a pareto_k > 0.7 in model 'fit_separate'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier 0.0 0.0
fit_separate -3.2 2.6
fit_pooled -3.9 2.0
Different model posterior distributions for the mean quality. The pooled model ignores the variation between machines. The separate model doesn't benefit from the similarity of the machines and has higher uncertainty.
ph <- fit_hier |>
spread_rvars(b_Intercept, r_machine[machine,]) |>
mutate(machine_mean = b_Intercept + r_machine) |>
ggplot(aes(xdist=machine_mean, y=machine)) +
stat_halfeye() +
scale_y_continuous(breaks=1:6) +
labs(x='Quality', y='Machine', title='Hierarchical')
ps <- fit_separate |>
as_draws_df() |>
subset_draws(variable='b_machine', regex=TRUE) |>
set_variables(paste0('b_machine[', 1:6, ']')) |>
as_draws_rvars() |>
spread_rvars(b_machine[machine]) |>
mutate(machine_mean = b_machine) |>
ggplot(aes(xdist=machine_mean, y=machine)) +
stat_halfeye() +
scale_y_continuous(breaks=1:6) +
labs(x='Quality', y='Machine', title='Separate')
pp <- fit_pooled |>
spread_rvars(b_Intercept) |>
mutate(machine_mean = b_Intercept) |>
ggplot(aes(xdist=machine_mean, y=0)) +
stat_halfeye() +
scale_y_continuous(breaks=NULL) +
labs(x='Quality', y='All machines', title='Pooled')
(pp / ps / ph) * xlim(c(50,140))
Warning: Removed 998 rows containing missing values (`geom_slabinterval()`).
Make prior sensitivity analysis by power-scaling both prior and likelihood with focus on mean quality of each machine. We see no prior sensitivity.
machine_mean <- fit_hier |>
as_draws_df() |>
mutate(across(matches('r_machine'), ~ .x - b_Intercept)) |>
subset_draws(variable='r_machine', regex=TRUE) |>
set_variables(paste0('machine_mean[', 1:6, ']'))
powerscale_sensitivity(fit_hier, prediction = \(x, ...) machine_mean, num_args=list(digits=2)
)$sensitivity |>
filter(str_detect(variable,'machine_mean')) |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 6 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 machine_mean[1] 0.02 0.10 -
2 machine_mean[2] 0.03 0.07 -
3 machine_mean[3] 0.02 0.04 -
4 machine_mean[4] 0.03 0.10 -
5 machine_mean[5] 0.02 0.04 -
6 machine_mean[6] 0.02 0.06 -
The Sorafenib Toxicity Dataset in the metadat R package includes results from 13 studies investigating the occurrence of dose-limiting toxicities (DLTs) at different doses of Sorafenib.
Load data
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.ursino2021.rda'))
head(dat.ursino2021)
study year dose events total
1 Awada 2005 100 0 4
2 Awada 2005 200 0 3
3 Awada 2005 300 1 5
4 Awada 2005 400 1 10
5 Awada 2005 600 7 12
6 Awada 2005 800 1 3
Number of patients per study
dat.ursino2021 |>
group_by(study) |>
summarise(N = sum(total)) |>
ggplot(aes(x=N, y=study)) +
geom_col(fill=4) +
labs(x='Number of patients per study', y='Study')
Distribution of doses
dat.ursino2021 |>
ggplot(aes(x=dose)) +
geom_histogram(breaks=seq(50,1050,by=100), fill=4, colour=1) +
labs(x='Dose (mg)', y='Count') +
scale_x_continuous(breaks=seq(100,1000,by=100))
Each study uses 2–6 different dose levels. The three studies that include only two dose levels are likely to provide weak information on the slope.
crosstab <- with(dat.ursino2021,table(dose,study))
data.frame(count=colSums(crosstab), study=colnames(crosstab)) |>
ggplot(aes(x=count, y=study)) +
geom_col(fill=4) +
labs(x='Number of dose levels per study', y='Study')
Pooled model assumes all studies have the same dose effect (reminder: ~ dose is equivalent to ~ 1 + dose)
fit_pooled <- brm(events | trials(total) ~ dose,
prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
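When overriding the defaults like this, it can be helpful to first list which parameter classes brms would assign priors to; a sketch using the brms helper get_prior() (the printed output shape varies by brms version):

```r
# List the default priors and parameter classes for this formula,
# without fitting the model.
get_prior(events | trials(total) ~ dose,
          family = binomial(), data = dat.ursino2021)
```

The class column in the output shows which prior() statements (class='Intercept', class='b') apply to which parameters.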
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ dose
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.17 0.37 -3.93 -2.46 1.00 1465 1690
dose 0.00 0.00 0.00 0.01 1.00 2921 2417
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Dose coefficient seems to be very small. Looking at the posterior, we see that it is positive with high probability.
fit_pooled |>
as_draws() |>
subset_draws(variable='b_dose') |>
summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
# A tibble: 1 × 5
variable `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
<chr> <dbl> <dbl> <dbl> <dbl>
1 b_dose 0.00230 0.00520 0.0000262 0.0000351
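The tiny magnitude is purely an artefact of the mg units: multiplying the per-mg coefficient by 1000 converts it to a per-gram effect. A quick check using the interval endpoints from the summary above:

```r
# Converting the per-mg 95% interval endpoints to a per-gram scale:
b_dose_ci  <- c(0.00230, 0.00520)  # from the summary above
b_doseg_ci <- b_dose_ci * 1000     # per gram: 2.3 to 5.2
b_doseg_ci
```

This matches the magnitude of the doseg coefficient in the rescaled model fit next.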
The dose was reported in mg, and most values are in hundreds. It is often sensible to switch to a scale in which the range of values is closer to unit range. In this case it is natural to use g instead of mg.
dat.ursino2021 <- dat.ursino2021 |>
mutate(doseg = dose/1000)
Fit the pooled model again using doseg
fit_pooled <- brm(events | trials(total) ~ doseg,
prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ doseg
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.58 0.29 -3.16 -2.03 1.00 2268 2665
doseg 2.41 0.59 1.27 3.55 1.00 2938 2546
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now it is easier to interpret the presented values. The separate model assumes all studies have a different dose effect. It would be a bit complicated to set a different prior on the study-specific intercepts and other coefficients, so we use the same prior for all.
fit_separate <- brm(events | trials(total) ~ 0 + study + doseg:study,
prior=prior(student_t(7, 0, 1.5), class='b'),
family=binomial(), data=dat.ursino2021)
Check the summary of the posterior and inference diagnostics.
fit_separate
Family: binomial
Links: mu = logit
Formula: events | trials(total) ~ 0 + study + doseg:study
Data: dat.ursino2021 (Number of observations: 49)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
studyAwada -1.68 0.69 -3.14 -0.43 1.00 4698
studyBorthakurMA -2.26 0.91 -4.17 -0.55 1.00 4941
studyBorthakurMB -1.39 0.82 -3.10 0.14 1.00 4251
studyChen -2.23 1.00 -4.34 -0.39 1.00 5939
studyClark -1.65 0.81 -3.41 -0.18 1.00 4620
studyCrumpMA -2.02 0.75 -3.62 -0.70 1.00 5868
studyCrumpMB -1.57 0.71 -3.07 -0.25 1.00 5238
studyFuruse -2.67 0.96 -4.82 -0.99 1.00 4835
studyMiller -1.03 0.50 -2.02 -0.04 1.00 3747
studyMinami -2.26 0.79 -3.94 -0.83 1.00 5252
studyMoore -1.71 0.72 -3.24 -0.41 1.00 4486
studyNabors -1.80 0.90 -3.79 -0.20 1.00 3954
studyStrumberg -1.73 0.63 -3.06 -0.55 1.00 4579
studyAwada:doseg 1.62 1.30 -0.75 4.36 1.00 4634
studyBorthakurMA:doseg -0.07 1.52 -3.03 3.03 1.00 5063
studyBorthakurMB:doseg 0.10 1.43 -2.68 3.03 1.00 4596
studyChen:doseg -0.77 1.76 -4.63 2.32 1.00 6400
studyClark:doseg 1.55 1.42 -1.04 4.59 1.00 4668
studyCrumpMA:doseg -0.32 1.55 -3.54 2.65 1.00 6677
studyCrumpMB:doseg 0.24 1.36 -2.43 2.97 1.00 5786
studyFuruse:doseg -0.49 1.70 -3.95 2.94 1.00 5489
studyMiller:doseg 0.00 1.45 -3.02 2.84 1.00 4094
studyMinami:doseg -0.28 1.49 -3.38 2.58 1.00 5360
studyMoore:doseg 0.54 1.41 -2.11 3.44 1.00 4827
studyNabors:doseg 1.33 1.26 -0.95 4.09 1.00 4099
studyStrumberg:doseg 0.38 1.13 -1.89 2.74 1.00 4219
Tail_ESS
studyAwada 3206
studyBorthakurMA 3152
studyBorthakurMB 2976
studyChen 3211
studyClark 2805
studyCrumpMA 3120
studyCrumpMB 3322
studyFuruse 2678
studyMiller 2219
studyMinami 2903
studyMoore 3102
studyNabors 2520
studyStrumberg 2744
studyAwada:doseg 2931
studyBorthakurMA:doseg 3251
studyBorthakurMB:doseg 2998
studyChen:doseg 2817
studyClark:doseg 2548
studyCrumpMA:doseg 3187
studyCrumpMB:doseg 3137
studyFuruse:doseg 2676
studyMiller:doseg 2458
studyMinami:doseg 2824
studyMoore:doseg 2958
studyNabors:doseg 2473
studyStrumberg:doseg 2667
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We build two different hierarchical models. The first one has a hierarchical model for the intercept, that is, each study has a parameter telling how much that study differs from the common population intercept.
fit_hier1 <- brm(events | trials(total) ~ doseg + (1 | study),
prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
The second hierarchical model assumes that also the slope can vary between the studies.
fit_hier2 <- brm(events | trials(total) ~ doseg + (doseg | study),
prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
prior(normal(0, 1), class='b')),
family=binomial(), data=dat.ursino2021)
We see some divergences due to highly varying posterior curvature. We repeat the sampling with higher adapt_delta, which adjusts the step size to be smaller. Higher adapt_delta makes the computation slower, but that is not an issue in this case. If you get divergences with adapt_delta=0.99, it is likely that even larger values won't help, and you need to consider a different parameterisation, a different model, or more informative priors.
fit_hier2 <- update(fit_hier2, control=list(adapt_delta=0.99))
LOO-CV comparison
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier1), loo(fit_hier2))
Warning: Found 6 observations with a pareto_k > 0.7 in model 'fit_separate'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
Warning: Found 1 observations with a pareto_k > 0.7 in model 'fit_hier2'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier1 0.0 0.0
fit_pooled -0.9 1.7
fit_hier2 -0.9 0.6
fit_separate -15.5 2.9
We get warnings about several Pareto k's > 0.7 in PSIS-LOO for the separate model, but as in that case the LOO-CV estimate is usually overoptimistic and the separate model is the worst anyway, there is no need to use more accurate computation for it.
We also get warnings about a few Pareto k's > 0.7 in PSIS-LOO for both hierarchical models. We can improve the accuracy by running MCMC for these LOO folds. We use the add_criterion() function to store the LOO computation results, as they take a bit longer now. We get some divergences in the case of the second hierarchical model, as leaving out an observation from a study that has only two dose levels gives the posterior a difficult shape.
fit_hier1 <- add_criterion(fit_hier1, criterion='loo', reloo=TRUE)
fit_hier2 <- add_criterion(fit_hier2, criterion='loo', reloo=TRUE)
We repeat the LOO-CV comparison (without the separate model). The loo() function uses the results added to the fit objects.
loo_compare(loo(fit_pooled), loo(fit_hier1), loo(fit_hier2))
elpd_diff se_diff
fit_hier1 0.0 0.0
fit_hier2 -0.8 0.6
fit_pooled -0.9 1.7
The results did not change much. The first hierarchical model is slightly better than the other models, but for predictive purposes there is not much difference (there is high aleatoric uncertainty in the predictions). Adding a hierarchical model for the slope decreased the predictive performance, so it is likely that there is not enough information about the variation in slopes between studies.
Posterior predictive checking showing the observed and predicted number of events. A rootogram uses the square root of counts on the y-axis for better scaling. The rootogram is useful for count data when the range of counts is small or moderate.
pp_check(fit_pooled, type = "rootogram") +
labs(title='Pooled model')
pp_check(fit_hier1, type = "rootogram") +
labs(title='Hierarchical model 1')
pp_check(fit_hier2, type = "rootogram") +
labs(title='Hierarchical model 2')
We see that the hierarchical models give higher probability to future counts bigger than the maximum observed count, and have a longer predictive distribution tail. This is natural, as uncertainty in the variation between studies increases the predictive uncertainty, too, especially as the number of studies is relatively small.
The population-level coefficient posterior given the pooled model
plot_posterior_pooled <- mcmc_areas(as_draws_df(fit_pooled), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Pooled model')
The population level coefficient posterior given hierarchical model 1
plot_posterior_hier1 <- mcmc_areas(as_draws_df(fit_hier1), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Hierarchical model 1')
The population-level coefficient posterior given hierarchical model 2
plot_posterior_hier2 <- mcmc_areas(as_draws_df(fit_hier2), regex_pars='b_doseg') +
geom_vline(xintercept=0, linetype='dashed') +
labs(title='Hierarchical model 2')
(plot_posterior_pooled / plot_posterior_hier1 / plot_posterior_hier2) * xlim(c(0,8.5))
Warning: Removed 1 rows containing missing values (`geom_segment()`).
All models agree that the slope is very likely positive. The hierarchical models have more uncertainty, but also higher posterior mean.
When we look at the study-specific parameters, we see that the Miller study has a slightly higher intercept (leading to higher theta).
(mcmc_areas(as_draws_df(fit_hier1), regex_pars='r_study\\[.*Intercept') +
labs(title='Hierarchical model 1')) /
(mcmc_areas(as_draws_df(fit_hier2), regex_pars='r_study\\[.*Intercept') +
labs(title='Hierarchical model 2'))
There are no clear differences in slopes.
mcmc_areas(as_draws_df(fit_hier2), regex_pars='r_study\\[.*doseg') +
labs(title='Hierarchical model 2')
Based on the LOO comparison, we could continue with any of the models, but if we want to take into account the unknown possible study variations, it is best to continue with hierarchical model 2. We could reduce the uncertainty by spending some effort to elicit more informative priors for the between-study variation, for example by searching open study databases for similar studies. In this example, we skip that and continue with other parts of the workflow.
fit_hier2 |>
powerscale_sequence() |>
powerscale_plot_dens(variables='b_doseg') +
# switch rows and cols
facet_grid(rows=vars(.data$variable),
cols=vars(.data$component)) +
# cleaning
ggtitle(NULL,NULL) +
labs(x='Dose (g) coefficient', y=NULL) +
scale_y_continuous(breaks=NULL) +
theme(axis.line.y=element_blank(),
strip.text.y=element_blank())
Summarise the prior and likelihood sensitivity using cumulative Jensen-Shannon distance, focusing on the population-level dose coefficient.
powerscale_sensitivity(fit_hier2, variable='b_doseg'
)$sensitivity |>
mutate(across(where(is.double), ~num(.x, digits=2)))
# A tibble: 1 × 4
variable prior likelihood diagnosis
<chr> <num:.2!> <num:.2!> <chr>
1 b_doseg 0.25 0.17 prior-data conflict
The posterior for the probability of an event given a certain dose and a new study, for hierarchical model 2.
data.frame(study='new',
doseg=seq(0.1,1,by=0.1),
total=1) |>
add_linpred_draws(fit_hier2, transform=TRUE, allow_new_levels=TRUE) |>
ggplot(aes(x=doseg, y=.linpred)) +
stat_lineribbon(.width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
scale_fill_brewer()+
labs(x= "Dose (g)", y = 'Probability of event', title='Hierarchical model') +
theme(legend.position="none") +
geom_hline(yintercept=0) +
scale_x_continuous(breaks=seq(0.1,1,by=0.1)) +
ylim(c(0,0.15))
Warning: Removed 24617 rows containing missing values (`stat_slabinterval()`).
If we plot individual posterior draws, we see that there is a lot of uncertainty about the overall probability (explained by the variation in Intercept in different studies), but less uncertainty about the slope.
data.frame(study='new',
doseg=seq(0.1,1,by=0.1),
total=1) |>
add_linpred_draws(fit_hier2, transform=TRUE, allow_new_levels=TRUE, ndraws=100) |>
ggplot(aes(x=doseg, y=.linpred)) +
geom_line(aes(group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
scale_fill_brewer()+
labs(x= "Dose (g)", y = 'Probability of event') +
theme(legend.position="none") +
geom_hline(yintercept=0) +
scale_x_continuous(breaks=seq(0.1,1,by=0.1))
The Studies on Pharmacologic Treatments for Chronic Obstructive Pulmonary Disease dataset includes results from 39 trials examining pharmacologic treatments for chronic obstructive pulmonary disease (COPD).
Load data
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.baker2009.rda'))
# force character strings to factors for easier plotting
dat.baker2009 <- dat.baker2009 |>
mutate(study = factor(study),
treatment = factor(treatment),
id = factor(id))
Look at the first six lines of the data frame
head(dat.baker2009)
study year id treatment exac total
1 Llewellyn-Jones 1996 1996 1 Fluticasone 0 8
2 Llewellyn-Jones 1996 1996 1 Placebo 3 8
3 Boyd 1997 1997 2 Salmeterol 47 229
4 Boyd 1997 1997 2 Placebo 59 227
5 Paggiaro 1998 1998 3 Fluticasone 45 142
6 Paggiaro 1998 1998 3 Placebo 51 139
Total number of patients in each study varies a lot
dat.baker2009 |>
group_by(study) |>
summarise(N = sum(total)) |>
ggplot(aes(x=N, y=study)) +
geom_col(fill=4) +
labs(x='Number of patients per study', y='Study')
None of the treatments is included in every study, and each study includes 2–4 treatments.
crosstab <- with(dat.baker2009,table(study, treatment))
#
plot_treatments <- data.frame(number_of_studies=colSums(crosstab), treatment=colnames(crosstab)) |>
ggplot(aes(x=number_of_studies,y=treatment)) +
geom_col(fill=4) +
labs(x='Number of studies with a treatment X', y='Treatment') +
geom_vline(xintercept=nrow(crosstab), linetype='dashed') +
scale_x_continuous(breaks=c(0,10,20,30,39))
#
plot_studies <- data.frame(number_of_treatments=rowSums(crosstab), study=rownames(crosstab)) |>
ggplot(aes(x=number_of_treatments,y=study)) +
geom_col(fill=4) +
labs(x='Number of treatments in a study Y', y='Study') +
geom_vline(xintercept=ncol(crosstab), linetype='dashed') +
scale_x_continuous(breaks=c(0,2,4,6,8))
#
plot_treatments + plot_studies
The first model pools the information over studies, but estimates a separate theta for each treatment (including placebo).
fit_pooled <- brm(exac | trials(total) ~ 0 + treatment,
prior = prior(student_t(7, 0, 1.5), class='b'),
family=binomial(), data=dat.baker2009)
Check the summary of the posterior and inference diagnostics.
fit_pooled
Family: binomial
Links: mu = logit
Formula: exac | trials(total) ~ 0 + treatment
Data: dat.baker2009 (Number of observations: 94)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat
treatmentBudesonide -0.30 0.10 -0.50 -0.12 1.00
treatmentBudesonidePFormoterol -0.49 0.10 -0.68 -0.31 1.00
treatmentFluticasone 0.35 0.04 0.28 0.43 1.00
treatmentFluticasonePSalmeterol 0.12 0.03 0.06 0.18 1.00
treatmentFormoterol -0.71 0.06 -0.83 -0.59 1.00
treatmentPlacebo -0.28 0.02 -0.32 -0.24 1.00
treatmentSalmeterol -0.38 0.03 -0.44 -0.33 1.00
treatmentTiotropium -0.90 0.03 -0.96 -0.84 1.00
Bulk_ESS Tail_ESS
treatmentBudesonide 6266 2832
treatmentBudesonidePFormoterol 6879 3057
treatmentFluticasone 7007 3221
treatmentFluticasonePSalmeterol 7283 2962
treatmentFormoterol 6827 2633
treatmentPlacebo 7126 2947
treatmentSalmeterol 7786 3088
treatmentTiotropium 6762 3076
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Treatment effect posteriors
fit_pooled |>
as_draws_df() |>
subset_draws(variable='b_', regex=TRUE) |>
set_variables(paste0('b_treatment[', levels(factor(dat.baker2009$treatment)), ']')) |>
as_draws_rvars() |>
spread_rvars(b_treatment[treatment]) |>
mutate(theta_treatment = rfun(plogis)(b_treatment)) |>
ggplot(aes(xdist=theta_treatment, y=treatment)) +
stat_halfeye() +
labs(x='theta', y='Treatment', title='Pooled over studies, separate over treatments')
Treatment effect odds-ratio posteriors
theta <- fit_pooled |>
as_draws_df() |>
subset_draws(variable='b_', regex=TRUE) |>
set_variables(paste0('b_treatment[', levels(factor(dat.baker2009$treatment)), ']')) |>
as_draws_rvars() |>
spread_rvars(b_treatment[treatment]) |>
mutate(theta_treatment = rfun(plogis)(b_treatment))
theta_placebo <- filter(theta,treatment=='Placebo')$theta_treatment[[1]]
theta |>
mutate(treatment_oddsratio = (theta_treatment/(1-theta_treatment))/(theta_placebo/(1-theta_placebo))) |>
filter(treatment != "Placebo") |>
ggplot(aes(xdist=treatment_oddsratio, y=treatment)) +
stat_halfeye() +
labs(x='Odds-ratio', y='Treatment', title='Pooled over studies, separate over treatments') +
geom_vline(xintercept=1, linetype='dashed')
We see big variation between treatments, and two treatments even appear to be harmful, which is suspicious. Looking at the data, we see that not all studies included all treatments, so if the studies with higher event rates happened to include only some of the treatments, the above estimates can be badly confounded.
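To see which treatments appear in which studies, the coverage can be tabulated directly from the data. A quick check (a sketch, assuming `dat.baker2009` is loaded as above; this reproduces the cross-tabulation used for the plots earlier):

```r
# Cross-tabulate observations per study and treatment;
# zero cells indicate treatments missing from a study
coverage <- xtabs(~ study + treatment, data = dat.baker2009)
# Proportion of study-treatment cells with no data
mean(coverage == 0)
# Studies that include a placebo arm
rownames(coverage)[coverage[, 'Placebo'] > 0]
```

A large proportion of empty cells confirms that treatment and study are far from fully crossed, so the pooled-over-studies estimates conflate treatment and study effects.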
The target is a discrete count, but as the range of counts is large, a rootogram would look messy and a density overlay plot is a better choice. Posterior predictive checking with kernel density estimates for the data and 10 posterior predictive replicates shows clear discrepancy.
pp_check(fit_pooled, type='dens_overlay')
Posterior predictive checking with PIT values and ECDF difference plot with envelope shows clear discrepancy.
pp_check(fit_pooled, type='pit_ecdf', ndraws=4000)
Posterior predictive checking with LOO-PIT values show clear discrepancy.
pp_check(fit_pooled, type='loo_pit_qq', ndraws=4000) +
geom_abline() +
ylim(c(0,1))
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Removed 9 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_path()`).
The second model uses a hierarchical model for both treatment effects and study effects.
fit_hier <- brm(exac | trials(total) ~ (1 | treatment) + (1 | study),
family=binomial(), data=dat.baker2009)
Check the summary of the posterior and inference diagnostics.
fit_hier
Family: binomial
Links: mu = logit
Formula: exac | trials(total) ~ (1 | treatment) + (1 | study)
Data: dat.baker2009 (Number of observations: 94)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~study (Number of levels: 39)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.20 0.17 0.93 1.56 1.02 493 1134
~treatment (Number of levels: 8)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.18 0.07 0.09 0.35 1.00 1172 1880
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.91 0.21 -1.34 -0.50 1.00 412 747
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
LOO-CV comparison
loo_compare(loo(fit_pooled), loo(fit_hier))
Warning: Found 22 observations with a pareto_k > 0.7 in model 'fit_pooled'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
Warning: Found 24 observations with a pareto_k > 0.7 in model 'fit_hier'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier 0.0 0.0
fit_pooled -1948.2 302.5
We get warnings about Pareto k's > 0.7 in PSIS-LOO, but as the difference between the models is huge, we can be confident that the ordering would be the same even if we fixed the computation. The hierarchical model is much better, and there is high variation between studies. Clearly there are many highly influential observations.
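The warnings suggest using moment matching to fix the problematic observations. A sketch of how this could be done (moment matching with brms requires that the models were fitted with `save_pars = save_pars(all = TRUE)`, which was not done above, so the models would need to be refitted first; it can also be slow):

```r
# Moment matching improves the importance sampling for observations
# with high Pareto-k; requires save_pars = save_pars(all = TRUE) at fit time
loo_pooled <- loo(fit_pooled, moment_match = TRUE)
loo_hier <- loo(fit_hier, moment_match = TRUE)
loo_compare(loo_pooled, loo_hier)
# Alternatively, refit the model once per problematic observation
# (exact but expensive): loo(fit_hier, reloo = TRUE)
```

Here the elpd difference is so large relative to its standard error that fixing the computation would not change the conclusion.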
Posterior predictive checking with kernel density estimates for the data and 10 posterior predictive replicates looks good (although with this many parameters, this check is likely to be optimistic).
pp_check(fit_hier, type='dens_overlay')
Posterior predictive checking with PIT values and ECDF difference plot with envelope looks good (although with this many parameters, this check is likely to be optimistic).
pp_check(fit_hier, type='pit_ecdf', ndraws=4000)
Posterior predictive checking with LOO-PIT values looks good (although as there are Pareto-khat warnings, it is possible that this diagnostic is optimistic).
pp_check(fit_hier, type='loo_pit_qq', ndraws=4000) +
geom_abline() +
ylim(c(0,1))
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Removed 3 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_path()`).
Treatment effect posteriors now have much less variation.
fit_hier |>
spread_rvars(b_Intercept, r_treatment[treatment,]) |>
mutate(theta_treatment = rfun(plogis)(b_Intercept + r_treatment)) |>
ggplot(aes(xdist=theta_treatment, y=treatment)) +
stat_halfeye() +
labs(x='theta', y='Treatment', title='Hierarchical over studies, hierarchical over treatments')
Study effect posteriors show the expected high variation.
fit_hier |>
spread_rvars(b_Intercept, r_study[study,]) |>
mutate(theta_study = rfun(plogis)(b_Intercept + r_study)) |>
ggplot(aes(xdist=theta_study, y=study)) +
stat_halfeye() +
labs(x='theta', y='Study', title='Hierarchical over studies, hierarchical over treatments')
Treatment effect odds-ratio posteriors
theta <- fit_hier |>
spread_rvars(b_Intercept, r_treatment[treatment,]) |>
mutate(theta_treatment = rfun(plogis)(b_Intercept + r_treatment))
theta_placebo <- filter(theta,treatment=='Placebo')$theta_treatment[[1]]
theta |>
mutate(treatment_oddsratio = (theta_treatment/(1-theta_treatment))/(theta_placebo/(1-theta_placebo))) |>
filter(treatment != "Placebo") |>
ggplot(aes(xdist=treatment_oddsratio, y=treatment)) +
stat_halfeye() +
labs(x='Odds-ratio', y='Treatment', title='Hierarchical over studies, hierarchical over treatments') +
geom_vline(xintercept=1, linetype='dashed')
Treatment effect odds-ratios now look more reasonable. As all treatments are compared to the same placebo, there is less overlap in the odds-ratio distributions than when looking at the thetas, because every theta carries similar uncertainty about the overall event rate due to the high variation between studies. The third model includes an interaction so that the treatment effect can depend on the study.
fit_hier2 <- brm(exac | trials(total) ~ (1 | treatment) + (treatment | study),
family=binomial(), data=dat.baker2009, control=list(adapt_delta=0.9))
LOO-CV comparison
loo_compare(loo(fit_hier), loo(fit_hier2))
Warning: Found 24 observations with a pareto_k > 0.7 in model 'fit_hier'. It is
recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
Warning: Found 40 observations with a pareto_k > 0.7 in model 'fit_hier2'. It
is recommended to set 'moment_match = TRUE' in order to perform moment matching
for problematic observations.
elpd_diff se_diff
fit_hier2 0.0 0.0
fit_hier -3.6 3.4
We get warnings about Pareto k’s > 0.7 in PSIS-LOO, but as the models are similar, and the difference is small, we can be relatively confident that the more complex model is not better.